Assessing protein coding region integrity in cDNA sequencing projects
Authors
Abstract
MOTIVATION: In cDNA sequencing projects, it is vital to know whether the protein-coding region of a sequence is complete or whether errors occurred during library construction. Here we present a linear discriminant approach that predicts completeness by estimating, for each ATG in a sequence, the probability that it is the initiation codon.

RESULTS: Because full-length cDNA data are currently scarce, tests were performed on a non-redundant set of 660 DNA sequences containing an initiation codon, which had been conceptually spliced into mRNA/cDNA. As a negative control, we used an edited version of the same set in which each sequence retained only the region downstream of the initiation codon. Allowing only a single prediction per sequence, we chose the cut-off at which the positive and negative sets were discriminated equally well. At this cut-off, 67% of the sequences in each set were correctly classified, and in the positive set the correct ATG codon was also identified. Reliability can be increased further by raising the cut-off or by including homologues; the relative merits of each are discussed.

AVAILABILITY: The prediction program, ATGpr, and other data are available at http://www.hri.co.jp/atgpr
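The abstract does not list ATGpr's features or coefficients, so the following is only a minimal Python sketch of the general idea it describes: score every ATG with a linear discriminant, map the score to a probability, keep a single best ATG per sequence, accept it only above a cut-off, and choose the cut-off at which the positive and negative sets are classified equally well. Everything specific here is an assumption: the features (Kozak-style context, ORF length, ATG rank), the values in `WEIGHTS` and `BIAS`, the logistic mapping from score to probability, and all function names are hypothetical placeholders, not ATGpr's actual model.

```python
import math
import re
from typing import Dict, Optional, Sequence, Tuple

# HYPOTHETICAL features and weights for illustration only; ATGpr's real
# discriminant features and coefficients are not given in the abstract.
WEIGHTS = {
    "purine_at_minus3": 1.5,  # A/G three bases upstream of the ATG (Kozak-like context)
    "g_at_plus4": 0.8,        # G immediately after the ATG
    "log_orf_codons": 1.2,    # log10(1 + ORF length in codons)
    "is_first_atg": 0.7,      # this ATG is the most 5' one in the sequence
}
BIAS = -4.0
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_codons(seq: str, start: int) -> int:
    """Count codons from `start` up to (not including) the first in-frame stop."""
    n = 0
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            break
        n += 1
    return n


def atg_features(seq: str, pos: int, rank: int) -> Dict[str, float]:
    """Compute the placeholder feature vector for the ATG at `pos`."""
    return {
        "purine_at_minus3": 1.0 if pos >= 3 and seq[pos - 3] in "AG" else 0.0,
        "g_at_plus4": 1.0 if pos + 3 < len(seq) and seq[pos + 3] == "G" else 0.0,
        "log_orf_codons": math.log10(orf_codons(seq, pos) + 1),
        "is_first_atg": 1.0 if rank == 0 else 0.0,
    }


def atg_probability(features: Dict[str, float]) -> float:
    """Linear discriminant score mapped into (0, 1) with the logistic function."""
    z = BIAS + sum(WEIGHTS[k] * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def best_atg(seq: str) -> Tuple[Optional[int], float]:
    """Return (position, probability) of the highest-scoring ATG, or (None, 0.0)."""
    seq = seq.upper()
    atgs = [m.start() for m in re.finditer("ATG", seq)]
    if not atgs:
        return None, 0.0
    probs = [atg_probability(atg_features(seq, p, rank)) for rank, p in enumerate(atgs)]
    i = max(range(len(atgs)), key=lambda k: probs[k])
    return atgs[i], probs[i]


def predict_start(seq: str, cutoff: float) -> Tuple[Optional[int], float]:
    """Single prediction per sequence: the best ATG, but only if it clears the cut-off.
    A None position means the coding region is predicted to be incomplete."""
    start, prob = best_atg(seq)
    if start is not None and prob >= cutoff:
        return start, prob
    return None, prob


def equal_error_cutoff(pos: Sequence[Tuple[float, bool]], neg: Sequence[float]) -> float:
    """Choose the cut-off where the fraction of positives correctly called (above the
    cut-off AND the top ATG is the true start) is closest to the fraction of negatives
    correctly rejected (top probability below the cut-off)."""
    candidates = sorted({p for p, _ in pos} | set(neg))

    def gap(t: float) -> float:
        sens = sum(p >= t and ok for p, ok in pos) / len(pos)
        spec = sum(p < t for p in neg) / len(neg)
        return abs(sens - spec)

    return min(candidates, key=gap)
```

In practice the weights would have to be fitted on sequences with known start codons (for example with Fisher's linear discriminant), and, as the abstract notes, raising `cutoff` above the equal-error point trades sensitivity for reliability.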
Similar resources
Characterization of cDNA clones selected by the GeneMark analysis from size-fractionated cDNA libraries from human brain.
We have conducted a sequencing project of human cDNAs which encode large proteins in brain. For selection of cDNA clones to be sequenced in this project, cDNA clones have been experimentally examined by in vitro transcription/translation prior to sequencing. In this study, we tested an alternative approach for picking cDNA clones with a high probability of carrying a protein-coding region. T...
Transcriptome Sequencing of Guilan Native Cow in Comparison with bosTau4 Reference Genome
RNA-sequencing is a new method of transcriptome characterization of organisms. Based on identity and relatedness, there are large genetic variations among different cattle breeds. The goal of the current study was to sequence the transcriptome of Guilan native cow and compare with available reference genome using RNA-sequencing method. Blood samples were collected from 14 Guilan native cows and...
Cloning and Characterization of cbhII Gene from Trichoderma parceramosum and Its Expression in Pichia pastoris
The genomic and cDNA clones encoding cellobiohydrolase II (CBHII) have been isolated and sequenced from a native Iranian isolate of Trichoderma parceramosum that produces high levels of cellulolytic enzymes. This represents the first report of the cbhII gene from this organism. Comparison of genomic and cDNA sequences indicates this gene contains three short introns and also an open reading frame codin...
Prediction of the coding sequences of mouse homologues of KIAA gene: IV. The complete nucleotide sequences of 500 mouse KIAA-homologous cDNAs identified by screening of terminal sequences of cDNA clones randomly sampled from size-fractionated libraries.
We have been conducting a mouse cDNA project to predict protein-coding sequences of mouse homologues of human KIAA and FLJ genes since 2001. As an extension of these projects, we herein present the entire sequences of 500 mKIAA cDNA clones and 4 novel cDNA clones that were incidentally identified during this project. We have isolated cDNA clones from the size-fractionated mouse cDNA libraries d...
Cloning & Expression of F Protein Gene (HR1 region) of Newcastle Disease Virus NR43 Isolate from Iran in E. coli
Background and Aims: NDV (Newcastle Disease Virus) is one of the viruses that cause disease in birds, with severe economic losses to the poultry industry in many countries. The fusion (F) protein, which plays a major role in virus pathogenicity, contains several regions involved in the fusion process. Mutation in the sequence of the HR1 & HR2 regions of this protein prevents fusion of the viru...
Journal: Bioinformatics
Volume 14, Issue 5
Pages: -
Publication date: 1998